Multiple-objective portfolio optimization¶
INTRODUCTION¶
Your task is to solve a multiple-objective portfolio optimization problem.
- Use the basic Markowitz model from 1952 (see Lecture 1).
- Solving here means constructing Pareto front approximations.
- The dataset is the same as for the portfolio game part 1 (bundle1.zip).
- The dataset consists of the historical prices of 20 assets.
- The bundle contains 20 files (*.txt) linked to different assets.
- The name of the file suggests the asset's name.
- The structure of every file is as follows:
- The first line contains the name of the asset.
- The second line provides the number of data points N.
- The following N lines are data points with the structure: time, price.
- The historical timeline for all assets is time $\in$ [0,100].
- Future predictions should be calculated for time = 200.
Goal:
- Load data, make predictions, and build the model.
- Illustrate your predictions (this can be done in the Jupyter notebook).
- Then, implement the WSM and ECM methods (see the tutorial on quadratic programming provided below).
- Run your implementations for different calculation limits (e.g., the number of weight vectors for WSM). Compare the methods' efficiency in finding unique Pareto optimal solutions. Finally, illustrate generated Pareto fronts.
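As a preview of the optimization step, here is a minimal weighted-sum (WSM) sketch for the bi-objective Markowitz problem (maximize predicted return, minimize portfolio variance). The returns `mu` and covariance `Sigma` below are toy placeholders, not values from the bundle, and `scipy.optimize` is used so the sketch is self-contained; the cvxopt tutorial below shows how to cast the same QP for `solvers.qp`.

```python
import numpy as np
from scipy.optimize import minimize

# Toy inputs (assumed, not from the dataset): predicted returns and covariance.
mu = np.array([0.10, 0.20, 0.15])
Sigma = np.array([[0.05, 0.01, 0.00],
                  [0.01, 0.08, 0.02],
                  [0.00, 0.02, 0.06]])

def wsm_portfolio(lmbda):
    """Weighted-sum scalarization of the Markowitz problem:
        min  lmbda * w' Sigma w - (1 - lmbda) * mu' w
        s.t. sum(w) = 1, 0 <= w <= 1.
    """
    n = len(mu)
    obj = lambda w: lmbda * (w @ Sigma @ w) - (1 - lmbda) * (mu @ w)
    cons = ({'type': 'eq', 'fun': lambda w: np.sum(w) - 1.0},)
    bounds = [(0.0, 1.0)] * n
    res = minimize(obj, np.full(n, 1.0 / n), bounds=bounds, constraints=cons)
    return res.x

# Sweep the trade-off weight to approximate the Pareto front.
front = []
for lmbda in np.linspace(0.01, 0.99, 25):
    w = wsm_portfolio(lmbda)
    front.append((mu @ w, w @ Sigma @ w))  # (expected return, variance)
```

Each weight vector yields one Pareto-optimal candidate; distinct weights may converge to the same portfolio, which is exactly the efficiency question the task asks you to measure.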
Short tutorial on the cvxopt library for quadratic programming¶
import numpy as np
from cvxopt import matrix, solvers
QP Optimization Problem¶
General model:¶
$max$ $\boldsymbol{cx} - \dfrac{1}{2}\boldsymbol{x}^T\boldsymbol{Qx}$
$s.t.$
$\boldsymbol{Gx} \leq \boldsymbol{h}$
$\boldsymbol{x} \geq \boldsymbol{0}$
But the library uses the following form:¶
$min$ $\boldsymbol{cx} + \dfrac{1}{2}\boldsymbol{x}^T\boldsymbol{Qx}$
$s.t.$
$\boldsymbol{Gx} \leq \boldsymbol{h}$
$\boldsymbol{Ax} = \boldsymbol{b}$
Example¶
$min$ $2x^2_1+x_2^2+x_1x_2+x_1+x_2$
$s.t.$
$x_1 \geq 0$
$x_2 \geq 0$
$x_1 + x_2 = 1$
Hence:¶
Q = matrix([ [4.0, 1.0], [1.0, 2.0] ]) ## [4, 1] is 1st column, not row!
c = matrix([1.0, 1.0]) ### a plain list gives a column vector: dimensions (2, 1)
A = matrix([1.0, 1.0], (1,2)) ### (1, 2) = dimensions (1 row and 2 columns)
b = matrix(1.0)
G = matrix([[-1.0,0.0],[0.0,-1.0]]) ### multiplied both sides by -1
h = matrix([0.0,0.0]) ### multiplied both sides by -1
solQP=solvers.qp(Q, c, G, h, A, b)
     pcost       dcost       gap    pres   dres
 0:  1.8889e+00  7.7778e-01  1e+00  3e-16  2e+00
 1:  1.8769e+00  1.8320e+00  4e-02  2e-16  6e-02
 2:  1.8750e+00  1.8739e+00  1e-03  1e-16  5e-04
 3:  1.8750e+00  1.8750e+00  1e-05  1e-16  5e-06
 4:  1.8750e+00  1.8750e+00  1e-07  3e-16  5e-08
Optimal solution found.
print(solQP.keys())
dict_keys(['x', 'y', 's', 'z', 'status', 'gap', 'relative gap', 'primal objective', 'dual objective', 'primal infeasibility', 'dual infeasibility', 'primal slack', 'dual slack', 'iterations'])
print(solQP['x'])
print(solQP['primal objective'])
[ 2.50e-01]
[ 7.50e-01]

1.8750000000000182
We can also solve LP problems:¶
$min$ $\boldsymbol{c}\boldsymbol{x}$
$s.t.$
$\boldsymbol{Gx} \leq \boldsymbol{h}$
$\boldsymbol{Ax} = \boldsymbol{b}$ (optional)
Example¶
$min$ $2x_1+x_2$
$s.t.$
$-x_1 +x_2 \leq 1$
$x_1 + x_2 \geq 2$
$x_2 \geq 0$
$x_1 - 2x_2 \leq 4$
G = matrix([ [-1.0, -1.0, 0.0, 1.0], [1.0, -1.0, -1.0, -2.0] ])
h = matrix([ 1.0, -2.0, 0.0, 4.0 ])
c = matrix([ 2.0, 1.0 ])
solLP = solvers.lp(c,G,h)
### Optionally, A and b (equality constraints) can also be provided, as in solvers.qp(Q, c, G, h, A, b)
     pcost       dcost       gap    pres   dres   k/t
 0:  2.6471e+00 -7.0588e-01  2e+01  8e-01  2e+00  1e+00
 1:  3.0726e+00  2.8437e+00  1e+00  1e-01  2e-01  3e-01
 2:  2.4891e+00  2.4808e+00  1e-01  1e-02  2e-02  5e-02
 3:  2.4999e+00  2.4998e+00  1e-03  1e-04  2e-04  5e-04
 4:  2.5000e+00  2.5000e+00  1e-05  1e-06  2e-06  5e-06
 5:  2.5000e+00  2.5000e+00  1e-07  1e-08  2e-08  5e-08
Optimal solution found.
print(solLP.keys())
dict_keys(['x', 'y', 's', 'z', 'status', 'gap', 'relative gap', 'primal objective', 'dual objective', 'primal infeasibility', 'dual infeasibility', 'primal slack', 'dual slack', 'residual as primal infeasibility certificate', 'residual as dual infeasibility certificate', 'iterations'])
print(solLP['x'])
print(solLP['primal objective'])
[ 5.00e-01]
[ 1.50e+00]

2.499999989554308
Portfolio optimization¶
Import the necessary libraries (numpy, pandas, matplotlib, cvxopt).
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from cvxopt import matrix, solvers
from sklearn.linear_model import Lasso
from scipy.signal import savgol_filter
# For reproducible randomness
np.random.seed(42)
Load the data from *Part1.txt files. Each file’s first line is the asset name, second line is N, and then N lines with “time price”.
def load_asset_data(data_folder="data"):
    asset_names = []
    asset_times = []
    asset_prices = []
    txt_files = [f for f in os.listdir(data_folder) if f.endswith("Part1.txt")]
    for fname in txt_files:
        path = os.path.join(data_folder, fname)
        with open(path, "r") as f:
            # 1) asset name
            asset_name = f.readline().strip()
            # 2) number of data points
            N_line = f.readline().strip()
            N = int(N_line)
            # 3) read time, price lines
            times = []
            prices = []
            for _ in range(N):
                line = f.readline().strip()
                t_str, p_str = line.split()
                times.append(float(t_str))
                prices.append(float(p_str))
        asset_names.append(asset_name)
        asset_times.append(times)
        asset_prices.append(prices)
    print(f"Found {len(asset_names)} assets.")
    print("First few asset names:", asset_names[:5])
    return asset_names, asset_times, asset_prices
asset_names, asset_times, asset_prices = load_asset_data()
Found 20 assets.
First few asset names: ['SafeAndCare', 'Moneymakers', 'Fuel4', 'CPU-XYZ', 'MarsProject']
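The risk side of the Markowitz model needs a covariance matrix. One simple estimate (an assumption on my part, not prescribed by the task) builds per-step relative returns from the loaded price lists and takes their sample covariance:

```python
import numpy as np

def estimate_covariance(asset_prices):
    """Sample covariance of per-step relative returns across assets.

    Assumes all assets share the same time grid, so the return series align
    row-by-row when stacked.
    """
    returns = []
    for prices in asset_prices:
        p = np.asarray(prices, dtype=float)
        returns.append(p[1:] / p[:-1] - 1.0)  # simple per-step returns
    R = np.vstack(returns)   # shape (n_assets, N - 1)
    return np.cov(R)         # shape (n_assets, n_assets)

# Example with two synthetic price series on a shared grid:
demo_prices = [[100, 101, 103, 102, 105], [50, 50.5, 51, 50, 52]]
Sigma = estimate_covariance(demo_prices)
```

With the real data, `estimate_covariance(asset_prices)` would give the 20x20 matrix used as `Q` (up to a factor of 2) in the QP formulations below.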
def plot_asset_forecasts(asset_names, asset_times, asset_prices, forecast_results, training_offset=100, color="red"):
    """
    Plots forecast data for multiple assets on a 5x4 grid.

    Parameters:
        asset_names (list): List of asset names.
        asset_times (list of arrays/lists): Time points for each asset.
        asset_prices (list of arrays/lists): Historical prices for each asset.
        forecast_results (dict): Dictionary mapping asset names to forecast results for a specific method.
            Each entry for an asset should contain:
                - "price_at_training": forecast price at training time.
                - "price_at_forecast": forecast price at forecast time.
                - "full_reconstruction": (optional) tuple (recon_times, reconstruction) for the full reconstruction curve.
                - "forecast_time": (optional) forecast time (defaults to 200 if not provided).
        training_offset (int): The offset between forecast time and training time (default 100).
        color (str): Color for all forecast markers and lines.

    The function creates a grid of subplots (max 20 assets), where for each asset:
        - Historical data is plotted as black scatter points.
        - A training marker is plotted at (forecast_time - training_offset) using a circle marker.
        - A forecast marker is plotted at forecast_time using an 'x' marker.
        - If available, a reconstruction curve is plotted.
    """
    # Set up a 5x4 grid (max 20 assets)
    fig, axes = plt.subplots(5, 4, figsize=(20, 25))
    axes = axes.flatten()
    num_assets = min(len(asset_names), 20)
    for i in range(num_assets):
        ax = axes[i]
        asset = asset_names[i]
        times = np.array(asset_times[i])
        prices = np.array(asset_prices[i])
        # Plot historical data
        ax.scatter(times, prices, label="Historical Prices", color="black")
        # Retrieve the forecast result for this asset (it may be missing if a model fit failed)
        forecast = forecast_results.get(asset)
        if forecast is None:
            ax.set_title(f"{asset} (no forecast)")
            continue
        forecast_time = forecast.get("forecast_time", 200)
        training_time = forecast_time - training_offset
        # Plot training marker
        ax.scatter([training_time], [forecast["price_at_training"]],
                   marker="o", color=color, s=100,
                   label=f"Training @ t={training_time}")
        # Plot forecast marker
        ax.scatter([forecast_time], [forecast["price_at_forecast"]],
                   marker="x", color=color, s=100,
                   label=f"Forecast @ t={forecast_time}")
        # Plot reconstruction curve if available
        if "full_reconstruction" in forecast:
            recon_times, reconstruction = forecast["full_reconstruction"]
            ax.plot(recon_times, reconstruction, label="Reconstruction",
                    color=color, linewidth=2)
        ax.set_title(asset)
        ax.set_xlabel("Time")
        ax.set_ylabel("Price")
        ax.legend(fontsize='small')
    # Remove any unused subplots if there are fewer than 20 assets.
    for j in range(num_assets, len(axes)):
        fig.delaxes(axes[j])
    plt.tight_layout()
    plt.show()
all_results = {}
Perform a simple linear regression (degree 1) for each asset using times in $[0, 100]$. Extrapolate to time $=200$ to get a predicted price, and convert the predicted price growth into a predicted return $\mu_i$.
import numpy as np
from sklearn.linear_model import Lasso
from scipy.signal import savgol_filter
def baseline_forecast(asset_names, asset_times, asset_prices, training_start=0, training_end=100, forecast_time=200):
    """
    Returns a dictionary with, for each asset:
        - "model_predictions": (training_times, model_predictions) on [training_start, training_end]
        - "full_reconstruction": (recon_times, linear_reconstruction) on [training_start, forecast_time]
        - "price_at_training": predicted price at training_end
        - "price_at_forecast": predicted price at forecast_time
        - "predicted_return": computed return
    """
    baseline_pred = {}
    for i, name in enumerate(asset_names):
        times = np.array(asset_times[i])
        prices = np.array(asset_prices[i])
        # Use training data: times between training_start and training_end
        mask = (times >= training_start) & (times <= training_end)
        t_used = times[mask]
        p_used = prices[mask]
        coeffs = np.polyfit(t_used, p_used, deg=1)
        # Model predictions on the training interval
        model_predictions = np.polyval(coeffs, t_used)
        price_at_training = np.polyval(coeffs, training_end)
        price_at_forecast = np.polyval(coeffs, forecast_time)
        predicted_return = (price_at_forecast - price_at_training) / price_at_training
        # Generate a new time vector for the full reconstruction:
        recon_times = np.linspace(training_start, forecast_time, forecast_time - training_start)
        linear_reconstruction = np.polyval(coeffs, recon_times)
        baseline_pred[name] = {
            "model_predictions": (t_used, model_predictions),
            "full_reconstruction": (recon_times, linear_reconstruction),
            "price_at_training": price_at_training,
            "price_at_forecast": price_at_forecast,
            "predicted_return": predicted_return
        }
    return baseline_pred
linear_results = baseline_forecast(asset_names, asset_times, asset_prices,
training_start=0, training_end=100, forecast_time=200)
plot_asset_forecasts(asset_names, asset_times, asset_prices, linear_results, training_offset=100, color="red")
import numpy as np
from scipy.signal import savgol_filter
from sklearn.linear_model import Lasso
def build_candidate_library(x, candidate_frequencies):
    """
    Build a design matrix with a constant, linear, and quadratic term,
    plus sine and cosine functions for each candidate frequency.
    """
    features = []
    feature_names = []
    features.append(np.ones_like(x))
    feature_names.append('1')
    features.append(x)
    feature_names.append('x')
    features.append(x**2)
    feature_names.append('x^2')
    for w in candidate_frequencies:
        features.append(np.sin(2 * np.pi * w * x))
        feature_names.append(f'sin(2π*{w:.2f}x)')
        features.append(np.cos(2 * np.pi * w * x))
        feature_names.append(f'cos(2π*{w:.2f}x)')
    X = np.column_stack(features)
    return X, feature_names
def sparse_forecast(asset_names, asset_times, asset_prices, training_start=0, training_end=100, forecast_time=200, alpha=0.01):
    """
    For each asset, perform:
        1. Denoise the full signal.
        2. From training data ([training_start, training_end]), perform an FFT on the
           denoised data to select candidate frequencies.
        3. Compute the linear trend using raw training data.
        4. Subtract the trend from the denoised training data.
        5. Build a candidate library over the full time series.
        6. Fit a sparse (LASSO) model on the detrended training data.
        7. Reconstruct the full signal (by adding back the extrapolated trend).
        8. Evaluate the learned function at any new x using the candidate library.
    """
    sparse_pred = {}
    for i, name in enumerate(asset_names):
        times = np.array(asset_times[i])
        prices = np.array(asset_prices[i])
        # 1. Denoise the full signal.
        prices_denoised = savgol_filter(prices, window_length=11, polyorder=2)
        # 2. Restrict to training data and perform an FFT on the denoised training data.
        mask = (times >= training_start) & (times <= training_end)
        t_train = times[mask]
        p_train_denoised = prices_denoised[mask]
        N = len(t_train)
        T = t_train[1] - t_train[0] if N > 1 else 1
        fft_vals = np.fft.fft(p_train_denoised)
        freq = np.fft.fftfreq(N, T)
        pos_mask = freq > 0
        freq_pos = freq[pos_mask]
        fft_magnitude = np.abs(fft_vals)[pos_mask]
        threshold = np.mean(fft_magnitude) + 1 * np.std(fft_magnitude)
        candidate_frequencies = freq_pos[fft_magnitude > threshold]
        candidate_frequencies = np.unique(np.round(candidate_frequencies, 2))
        # 3. Compute the trend using raw training data (to preserve the true slope).
        p_train_raw = prices[mask]
        coeffs_trend = np.polyfit(t_train, p_train_raw, 1)
        trend_train = np.polyval(coeffs_trend, t_train)
        # 4. Subtract the trend from the denoised training data.
        p_train_detrended = p_train_denoised - trend_train
        # 5. Build the candidate library over the full time series.
        X, _ = build_candidate_library(times, candidate_frequencies)
        X_train = X[mask, :]
        # 6. Fit sparse regression (LASSO) on the detrended training data.
        lasso = Lasso(alpha=alpha, max_iter=10000)
        lasso.fit(X_train, p_train_detrended)
        coefs = lasso.coef_
        intercept_sparse = lasso.intercept_
        # 7. Reconstruct the full signal.
        p_detrended_reconstructed = intercept_sparse + X.dot(coefs)
        trend_full = np.polyval(coeffs_trend, times)
        p_reconstructed = p_detrended_reconstructed + trend_full
        # Clip negative values in the training reconstruction:
        p_reconstructed = np.maximum(p_reconstructed, 0)
        # 8. Define a prediction function that evaluates the learned function at any new x.
        def predict_new(x_new):
            X_new, _ = build_candidate_library(np.array([x_new]), candidate_frequencies)
            value = intercept_sparse + X_new.dot(coefs) + np.polyval(coeffs_trend, x_new)
            # Return a non-negative prediction.
            return max(value.item(), 0)
        price_at_training = predict_new(training_end)
        price_at_forecast = predict_new(forecast_time)
        predicted_return = (price_at_forecast - price_at_training) / price_at_training
        # Generate a new reconstruction time vector from training_start to forecast_time.
        recon_times = np.linspace(training_start, forecast_time, forecast_time - training_start)
        X_new, _ = build_candidate_library(recon_times, candidate_frequencies)
        p_detrended_new = intercept_sparse + X_new.dot(coefs)
        p_reconstructed_new = p_detrended_new + np.polyval(coeffs_trend, recon_times)
        # Clip negative values.
        p_reconstructed_new = np.maximum(p_reconstructed_new, 0)
        sparse_pred[name] = {
            "model_predictions": (t_train, np.maximum(p_reconstructed[mask], 0)),  # diagnostics on training data
            "full_reconstruction": (recon_times, p_reconstructed_new),
            "price_at_training": price_at_training,
            "price_at_forecast": price_at_forecast,
            "predicted_return": predicted_return
        }
    return sparse_pred
# Example usage:
sparse_results = sparse_forecast(asset_names, asset_times, asset_prices,
training_start=0, training_end=100, forecast_time=200)
plot_asset_forecasts(asset_names, asset_times, asset_prices, sparse_results, training_offset=100, color="red")
Illustrate your predictions by plotting the historical data and the linear fit for a few assets, plus the forecasted price at t=200.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA
def arima_forecast(asset_names, asset_times, asset_prices, training_start=0, training_end=100, forecast_time=200, order=(1,1,1), use_log=True):
    """
    For each asset, fit an ARIMA model on training data ([training_start, training_end])
    and forecast up to forecast_time.

    If use_log is True, a logarithmic transformation is applied to the training data,
    and forecasts are exponentiated to ensure all predictions are positive.

    Computes the expected return as:
        (price_at_forecast - price_at_training) / price_at_training.

    Returns a dictionary with, for each asset:
        - "model_predictions": (training_times, in-sample predictions) on [training_start, training_end]
        - "full_reconstruction": (recon_times, arima_predictions) on [training_start, forecast_time]
        - "price_at_training": predicted price at training_end (last training prediction)
        - "price_at_forecast": predicted price at forecast_time (last forecast value)
        - "predicted_return": computed return
    """
    arima_pred = {}
    for i, name in enumerate(asset_names):
        times = np.array(asset_times[i])
        prices = np.array(asset_prices[i])
        # Select training data based on the provided time window.
        mask = (times >= training_start) & (times <= training_end)
        t_train = times[mask]
        p_train = prices[mask]
        if use_log:
            # Check that all training prices are positive.
            if np.any(p_train <= 0):
                raise ValueError(f"Asset {name} has non-positive prices; cannot use log transform.")
            p_train_trans = np.log(p_train)
        else:
            p_train_trans = p_train.copy()
        # Fit the ARIMA model on the (possibly transformed) training data.
        try:
            model = ARIMA(p_train_trans, order=order)
            model_fit = model.fit()
        except Exception as e:
            print(f"ARIMA fit failed for asset {name}: {e}")
            continue
        # In-sample prediction on the training data.
        pred_train_trans = model_fit.predict(start=0, end=len(p_train_trans) - 1)
        # If using the log transform, exponentiate the predictions.
        pred_train = np.exp(pred_train_trans) if use_log else pred_train_trans
        # Determine the number of forecast steps.
        forecast_steps = int(forecast_time - training_end)
        if forecast_steps <= 0:
            raise ValueError("forecast_time must be greater than training_end")
        # Forecast future values.
        forecast_trans = model_fit.forecast(steps=forecast_steps)
        forecast = np.exp(forecast_trans) if use_log else forecast_trans
        # Construct the full time series reconstruction.
        forecast_times = np.arange(training_end + 1, forecast_time + 1)
        full_times = np.concatenate([t_train, forecast_times])
        full_predictions = np.concatenate([pred_train, forecast])
        # Get the predicted price at training_end and forecast_time.
        price_at_training = pred_train[-1]
        price_at_forecast = forecast[-1]
        predicted_return = (price_at_forecast - price_at_training) / price_at_training
        # In case any residual negatives appear (shouldn't happen with the log transform):
        full_predictions = np.maximum(full_predictions, 0)
        arima_pred[name] = {
            "model_predictions": (t_train, pred_train),
            "full_reconstruction": (full_times, full_predictions),
            "price_at_training": price_at_training,
            "price_at_forecast": price_at_forecast,
            "predicted_return": predicted_return
        }
    return arima_pred
# Example usage:
# (Ensure asset_names, asset_times, asset_prices, and plot_asset_forecasts are defined.)
arima_results = arima_forecast(asset_names, asset_times, asset_prices,
training_start=0, training_end=100, forecast_time=200,
order=(2,1,2), use_log=True)
plot_asset_forecasts(asset_names, asset_times, asset_prices, arima_results, training_offset=100, color="red")
/Users/dromaniv/multiple-objective-optimization/.venv/lib/python3.12/site-packages/statsmodels/tsa/statespace/sarimax.py:966: UserWarning: Non-stationary starting autoregressive parameters found. Using zeros as starting parameters.
warn('Non-stationary starting autoregressive parameters'
/Users/dromaniv/multiple-objective-optimization/.venv/lib/python3.12/site-packages/statsmodels/tsa/statespace/sarimax.py:978: UserWarning: Non-invertible starting MA parameters found. Using zeros as starting parameters.
warn('Non-invertible starting MA parameters found.'
/Users/dromaniv/multiple-objective-optimization/.venv/lib/python3.12/site-packages/statsmodels/base/model.py:607: ConvergenceWarning: Maximum Likelihood optimization failed to converge. Check mle_retvals
warnings.warn("Maximum Likelihood optimization failed to "
import numpy as np
from statsmodels.tsa.holtwinters import ExponentialSmoothing
def exponential_smoothing_forecast(asset_names, asset_times, asset_prices,
                                   training_start=0, training_end=100, forecast_time=200,
                                   trend='add', seasonal=None, seasonal_periods=None,
                                   use_multiplicative=False):
    """
    Returns:
        A dictionary where each key is an asset name and each value is a dictionary containing:
            - "model_predictions": (training_times, in-sample predictions) on [training_start, training_end]
            - "full_reconstruction": (recon_times, exponential smoothing predictions) on [training_start, forecast_time]
            - "price_at_training": predicted price at training_end (last in-sample prediction)
            - "price_at_forecast": predicted price at forecast_time (last forecast value)
            - "predicted_return": computed return as (price_at_forecast - price_at_training) / price_at_training
    """
    exp_pred = {}
    for i, name in enumerate(asset_names):
        times = np.array(asset_times[i])
        prices = np.array(asset_prices[i])
        # Select training data based on the provided time window.
        mask = (times >= training_start) & (times <= training_end)
        t_train = times[mask]
        p_train = prices[mask]
        # Use per-asset settings so one asset's choice does not leak into the next iteration.
        trend_i, seasonal_i = trend, seasonal
        # Optionally use multiplicative components if the data are strictly positive.
        if use_multiplicative:
            if np.any(p_train <= 0):
                print(f"Warning: asset {name} has non-positive values; "
                      "a multiplicative model is not appropriate, using the additive model.")
            else:
                trend_i = 'mul'
                if seasonal_i is not None:
                    seasonal_i = 'mul'
        # Fit the exponential smoothing model.
        try:
            model = ExponentialSmoothing(p_train, trend=trend_i, seasonal=seasonal_i,
                                         seasonal_periods=seasonal_periods)
            model_fit = model.fit()
        except Exception as e:
            print(f"Exponential smoothing fit failed for asset {name}: {e}")
            continue
        # In-sample fitted values for the training period (clipped to be non-negative).
        pred_train = np.maximum(model_fit.fittedvalues, 0)
        # Determine the number of forecast steps.
        forecast_steps = int(forecast_time - training_end)
        if forecast_steps <= 0:
            raise ValueError("forecast_time must be greater than training_end")
        # Forecast future values (clipped to be non-negative).
        forecast_values = np.maximum(model_fit.forecast(steps=forecast_steps), 0)
        # Generate a time vector for the forecast period.
        forecast_times = np.arange(training_end + 1, forecast_time + 1)
        # Build the full reconstruction time series.
        full_times = np.concatenate([t_train, forecast_times])
        full_predictions = np.concatenate([pred_train, forecast_values])
        # Get the predicted prices at training_end and forecast_time.
        price_at_training = pred_train[-1]
        price_at_forecast = forecast_values[-1]
        predicted_return = (price_at_forecast - price_at_training) / price_at_training
        exp_pred[name] = {
            "model_predictions": (t_train, pred_train),
            "full_reconstruction": (full_times, full_predictions),
            "price_at_training": price_at_training,
            "price_at_forecast": price_at_forecast,
            "predicted_return": predicted_return
        }
    return exp_pred
exp_results = exponential_smoothing_forecast(asset_names, asset_times, asset_prices,
training_start=0, training_end=100, forecast_time=200,
trend='add', seasonal=None, seasonal_periods=None,
use_multiplicative=True)
plot_asset_forecasts(asset_names, asset_times, asset_prices, exp_results, training_offset=100, color="red")
import numpy as np
import pandas as pd
from sktime.forecasting.naive import NaiveForecaster
def naive_forecast(asset_names, asset_times, asset_prices,
                   training_start=0, training_end=100, forecast_time=200, strategy="last"):
    """
    Returns:
        A dictionary where for each asset the following keys are returned:
            - "model_predictions": (training_times, in-sample naive predictions) on [training_start, training_end]
            - "full_reconstruction": (recon_times, naive forecast predictions) on [training_start, forecast_time]
            - "price_at_training": reference price at training_end (from actual training data)
            - "price_at_forecast": forecasted price at forecast_time
            - "predicted_return": computed return as (price_at_forecast - price_at_training) / price_at_training
    """
    naive_pred = {}
    for i, name in enumerate(asset_names):
        times = np.array(asset_times[i])
        prices = np.array(asset_prices[i])
        # Select training data: times between training_start and training_end.
        mask = (times >= training_start) & (times <= training_end)
        t_train = times[mask]
        p_train = prices[mask]
        # Compute in-sample naive predictions.
        if strategy == "last":
            in_sample_predictions = np.empty_like(p_train)
            in_sample_predictions[0] = p_train[0]
            if len(p_train) > 1:
                in_sample_predictions[1:] = p_train[:-1]
        elif strategy == "drift":
            if len(p_train) > 1:
                drift = (p_train[-1] - p_train[0]) / (len(p_train) - 1)
            else:
                drift = 0
            in_sample_predictions = np.empty_like(p_train)
            in_sample_predictions[0] = p_train[0]
            for j in range(1, len(p_train)):
                in_sample_predictions[j] = p_train[j-1] + drift
        elif strategy == "mean":
            mean_value = np.mean(p_train)
            in_sample_predictions = np.full_like(p_train, fill_value=mean_value)
        else:
            raise ValueError("Unsupported strategy. Use 'last', 'drift', or 'mean'.")
        # Clip negative in-sample predictions.
        in_sample_predictions = np.maximum(in_sample_predictions, 0)
        # Determine the number of forecast steps.
        forecast_steps = int(forecast_time - training_end)
        if forecast_steps <= 0:
            raise ValueError("forecast_time must be greater than training_end")
        # Prepare the training series for sktime, assuming training times are consecutive integers.
        y_train = pd.Series(p_train, index=pd.RangeIndex(start=int(t_train[0]),
                                                         stop=int(t_train[0]) + len(t_train)))
        # Fit the NaiveForecaster from sktime.
        forecaster = NaiveForecaster(strategy=strategy)
        forecaster.fit(y_train)
        fh = np.arange(1, forecast_steps + 1)  # forecasting horizon (steps ahead)
        y_forecast = forecaster.predict(fh)
        # Clip forecast predictions to ensure non-negativity.
        forecast_values = np.maximum(y_forecast.values, 0)
        # Create forecast time points (assuming integer time steps).
        forecast_times = np.arange(training_end + 1, forecast_time + 1)
        # Full reconstruction: combine in-sample predictions and forecast values.
        full_times = np.concatenate([t_train, forecast_times])
        full_predictions = np.concatenate([in_sample_predictions, forecast_values])
        # Determine prices at training_end and forecast_time.
        price_at_training = p_train[-1]  # actual last training price as the reference
        price_at_forecast = forecast_values[-1]
        predicted_return = (price_at_forecast - price_at_training) / price_at_training
        naive_pred[name] = {
            "model_predictions": (t_train, in_sample_predictions),
            "full_reconstruction": (full_times, full_predictions),
            "price_at_training": price_at_training,
            "price_at_forecast": price_at_forecast,
            "predicted_return": predicted_return
        }
    return naive_pred
# Example usage:
naive_results = naive_forecast(asset_names, asset_times, asset_prices,
training_start=0, training_end=100, forecast_time=200, strategy="last")
plot_asset_forecasts(asset_names, asset_times, asset_prices, naive_results, training_offset=100, color="green")
import numpy as np
from sklearn.tree import DecisionTreeRegressor
def decision_tree_forecast(asset_names, asset_times, asset_prices,
training_start=0, training_end=100, forecast_time=200,
seasonal_period=None, max_depth=None):
"""
For each asset, forecast prices using a Decision Tree Regressor with conditional deseasonalization and detrending.
Steps for each asset:
1. Extract training data from training_start to training_end.
2. Fit a linear trend (using np.polyfit) to the training data.
3. Detrend the training data by subtracting the trend.
4. If seasonal_period is provided and > 1, compute a seasonal component (average residual per seasonal position)
and remove it (deseasonalize the data).
5. Fit a Decision Tree Regressor on the deseasonalized residuals.
6. For forecasting, predict the residuals on new time points, add back the trend (evaluated using the linear model)
and the seasonal component (if available) to obtain the final forecast.
"""
dt_pred = {}
for i, name in enumerate(asset_names):
times = np.array(asset_times[i])
prices = np.array(asset_prices[i])
# 1. Select training data
mask = (times >= training_start) & (times <= training_end)
t_train = times[mask]
p_train = prices[mask]
# 2. Fit a linear trend for detrending (using a first-degree polynomial)
coeffs = np.polyfit(t_train, p_train, deg=1)
trend_train = np.polyval(coeffs, t_train)
# 3. Detrend the training data
p_detrended = p_train - trend_train
# 4. Conditional deseasonalization if a seasonal period is provided
if seasonal_period is not None and seasonal_period > 1:
# Compute the average seasonal effect for each season position
seasonal_effect = np.zeros(seasonal_period)
count = np.zeros(seasonal_period)
for idx, t in enumerate(t_train):
pos = int((t - training_start) % seasonal_period)
seasonal_effect[pos] += p_detrended[idx]
count[pos] += 1
# Avoid division by zero and compute average
seasonal_effect = np.where(count > 0, seasonal_effect / count, 0)
# Build the seasonal component for each training time
seasonal_train = np.array([seasonal_effect[int((t - training_start) % seasonal_period)]
for t in t_train])
# Remove seasonal effect from the detrended data
p_deseasonalized = p_detrended - seasonal_train
else:
p_deseasonalized = p_detrended
seasonal_train = np.zeros_like(p_train) # no seasonal component
# 5. Fit Decision Tree Regressor on the deseasonalized (residual) training data.
dt_reg = DecisionTreeRegressor(max_depth=max_depth)
dt_reg.fit(t_train.reshape(-1, 1), p_deseasonalized)
# In-sample prediction on training data (residual prediction)
pred_deseasonalized_train = dt_reg.predict(t_train.reshape(-1, 1))
# Reconstruct training predictions by adding back the trend and seasonal component (if any)
pred_train_reconstructed = pred_deseasonalized_train + trend_train
if seasonal_period is not None and seasonal_period > 1:
pred_train_reconstructed += seasonal_train
# 6. Forecasting for times after training_end
forecast_times = np.arange(training_end + 1, forecast_time + 1)
# Predict the residuals for forecast times
pred_deseasonalized_forecast = dt_reg.predict(forecast_times.reshape(-1, 1))
# Evaluate trend for forecast times
trend_forecast = np.polyval(coeffs, forecast_times)
# If seasonal, compute seasonal component for forecast times
if seasonal_period is not None and seasonal_period > 1:
seasonal_forecast = np.array([seasonal_effect[int((t - training_start) % seasonal_period)]
for t in forecast_times])
else:
seasonal_forecast = np.zeros_like(forecast_times, dtype=float)
# Reconstruct forecasted predictions
forecast_pred = pred_deseasonalized_forecast + trend_forecast + seasonal_forecast
# 7. Build full reconstruction of predictions (training + forecast)
full_times = np.concatenate([t_train, forecast_times])
full_predictions = np.concatenate([pred_train_reconstructed, forecast_pred])
# Determine price at training_end (last training prediction) and at forecast_time (last forecast value)
price_at_training = pred_train_reconstructed[-1]
price_at_forecast = forecast_pred[-1]
predicted_return = (price_at_forecast - price_at_training) / price_at_training
dt_pred[name] = {
"model_predictions": (t_train, pred_train_reconstructed),
"full_reconstruction": (full_times, full_predictions),
"price_at_training": price_at_training,
"price_at_forecast": price_at_forecast,
"predicted_return": predicted_return
}
return dt_pred
# Example usage:
dt_results = decision_tree_forecast(asset_names, asset_times, asset_prices,
training_start=0, training_end=100, forecast_time=200,
seasonal_period=50, max_depth=50)
plot_asset_forecasts(asset_names, asset_times, asset_prices, dt_results, training_offset=100, color="orange")
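The detrend → deseasonalize → fit-on-residuals pipeline used above can be checked on a synthetic series (a hypothetical toy, not one of the bundle assets): fit a linear trend, average the detrended values at each season position, and verify that trend + seasonal + residual reconstructs the data exactly.

```python
import numpy as np

# Synthetic series: linear trend + period-4 seasonal pattern + small noise
rng = np.random.default_rng(0)
t = np.arange(40)
p = 0.5 * t + np.tile([1.0, -0.5, 0.3, -0.8], 10) + rng.normal(0, 0.05, 40)

# 1. Fit and remove the linear trend
coeffs = np.polyfit(t, p, 1)
detrended = p - np.polyval(coeffs, t)

# 2. Average the detrended values at each seasonal position
period = 4
seasonal_effect = np.array([detrended[t % period == k].mean() for k in range(period)])
seasonal = seasonal_effect[t % period]

# 3. Residual left over for the point forecaster (the decision tree above)
residual = detrended - seasonal

# Reconstruction is exact by construction
reconstructed = np.polyval(coeffs, t) + seasonal + residual
```

Fitting the regressor on the residual only is what lets the linear trend carry the forecast beyond the training range; a decision tree on its own extrapolates as a constant.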
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, SimpleRNN
def create_dataset(data, look_back):
"""
Create sliding-window sequences for training.
Args:
data: 1D numpy array of normalized asset prices.
look_back: number of time steps to use as input.
Returns:
X: numpy array of shape (n_samples, look_back)
y: numpy array of shape (n_samples,)
"""
X, y = [], []
for i in range(len(data) - look_back):
X.append(data[i:i+look_back])
y.append(data[i+look_back])
return np.array(X), np.array(y)
def rnn_forecast(asset_names, asset_times, asset_prices,
training_start=0, training_end=100, forecast_time=200,
look_back=10, epochs=20, batch_size=64):
"""
For each asset, this function:
1. Selects training data in [training_start, training_end].
2. Computes the min and max of the training data and normalizes it.
3. Creates sliding-window sequences and trains an RNN model on the normalized data.
4. Makes in-sample predictions and recursively forecasts until forecast_time (all in normalized space).
5. Transforms the predictions back to the original scale using the stored min and max.
6. Clips any negative predicted values (both in-sample and forecast) to 0.
"""
rnn_pred = {}
for i, name in enumerate(asset_names):
times = np.array(asset_times[i])
prices = np.array(asset_prices[i])
# Select training data using the provided time window.
mask = (times >= training_start) & (times <= training_end)
t_train = times[mask]
p_train = prices[mask]
if len(p_train) <= look_back:
raise ValueError(f"Not enough training data for asset {name} with look_back = {look_back}.")
# Get the min and max from the training data (original scale).
train_min = p_train.min()
train_max = p_train.max()
# Normalize the training data.
p_train_norm = (p_train - train_min) / (train_max - train_min)
# Create sliding-window sequences on the normalized data.
X_train, y_train = create_dataset(p_train_norm, look_back)
X_train = X_train.reshape((X_train.shape[0], X_train.shape[1], 1))
# Build the RNN model using the specified architecture.
model = Sequential()
model.add(SimpleRNN(units=50, return_sequences=True, input_shape=(X_train.shape[1], 1)))
model.add(SimpleRNN(units=50, return_sequences=False))
model.add(Dense(units=1))
model.compile(optimizer='adam', loss='mean_squared_error')
# Train the RNN model on normalized data (verbose=0 to suppress output)
model.fit(X_train, y_train, epochs=epochs, batch_size=batch_size, verbose=0)
# In-sample predictions on the normalized training data.
pred_train_norm = model.predict(X_train, verbose=0).flatten()
# Inverse transform the normalized predictions to the original scale.
pred_train = pred_train_norm * (train_max - train_min) + train_min
# Clip any negative predictions to 0.
pred_train = np.maximum(pred_train, 0)
# The in-sample predictions correspond to times t_train[look_back:].
t_train_pred = t_train[look_back:]
# Determine forecast steps (assumes integer time steps).
forecast_steps = int(forecast_time - training_end)
if forecast_steps <= 0:
raise ValueError("forecast_time must be greater than training_end")
# Recursive forecasting: use the last look_back normalized values to predict the next value.
last_sequence = p_train_norm[-look_back:].tolist()
forecasted_norm = []
for _ in range(forecast_steps):
input_seq = np.array(last_sequence[-look_back:]).reshape((1, look_back, 1))
next_val_norm = model.predict(input_seq, verbose=0)[0, 0]
forecasted_norm.append(next_val_norm)
last_sequence.append(next_val_norm)
# Inverse-transform the forecasted normalized predictions.
forecasted = np.array(forecasted_norm) * (train_max - train_min) + train_min
# Clip negative forecast values.
forecasted = np.maximum(forecasted, 0)
# Construct full reconstruction: combine in-sample predictions and forecasted values.
forecast_times = np.arange(training_end + 1, forecast_time + 1)
full_times = np.concatenate([t_train_pred, forecast_times])
full_predictions = np.concatenate([pred_train, forecasted])
full_predictions = np.maximum(full_predictions, 0)
price_at_training = pred_train[-1]
price_at_forecast = forecasted[-1]
predicted_return = (price_at_forecast - price_at_training) / price_at_training
rnn_pred[name] = {
"model_predictions": (t_train_pred, pred_train),
"full_reconstruction": (full_times, full_predictions),
"price_at_training": price_at_training,
"price_at_forecast": price_at_forecast,
"predicted_return": predicted_return
}
return rnn_pred
# (Assuming asset_names, asset_times, asset_prices, and plot_asset_forecasts are defined)
rnn_results = rnn_forecast(asset_names, asset_times, asset_prices,
training_start=0, training_end=100, forecast_time=200,
look_back=30, epochs=20, batch_size=64)
plot_asset_forecasts(asset_names, asset_times, asset_prices, rnn_results, training_offset=100, color="brown")
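The recursive forecasting step above is model-agnostic: each one-step prediction is appended to the window and fed back in. A minimal sketch with a stand-in predictor (a simple window mean, not the trained RNN) shows the mechanics:

```python
import numpy as np

def recursive_forecast(history, predict_next, look_back, steps):
    # Feed each one-step prediction back into the input window
    window = list(history[-look_back:])
    out = []
    for _ in range(steps):
        nxt = predict_next(np.array(window[-look_back:]))
        out.append(nxt)
        window.append(nxt)
    return np.array(out)

# Stand-in for model.predict: mean of the current window
mean_predictor = lambda w: float(w.mean())

series = np.array([1.0, 2.0, 3.0, 4.0])
fc = recursive_forecast(series, mean_predictor, look_back=2, steps=3)
# step 1: mean(3, 4) = 3.5; step 2: mean(4, 3.5) = 3.75; step 3: mean(3.5, 3.75) = 3.625
```

In `rnn_forecast` the trained model takes the place of the toy predictor, and the loop runs in normalized space before the inverse transform.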
import numpy as np
import matplotlib.pyplot as plt
def aggregate_and_plot_mean_median(asset_names, asset_times, asset_prices,
rnn_results, dt_results, naive_results, exp_results,
arima_results, sparse_results, linear_results,
training_start=0, training_end=100, forecast_time=200):
"""
For each asset, this function:
- Prints a table of predicted training price, forecast price, and expected return (in %)
for each forecasting method.
- Computes the pointwise median and mean forecasts across methods.
- Displays two separate 5×4 grids: one for the median forecast and one for the mean forecast.
Expected return is reported as (forecast / training_price) * 100, i.e., the forecast price as a percentage of the predicted training price.
Returns a dictionary with two keys: "median" and "mean". Each value is a dictionary where each asset
is represented with keys:
- "model_predictions": (training_times, aggregated in-sample predictions) – here we fill the training period with the aggregated training price.
- "full_reconstruction": (common_times, aggregated full forecast)
- "price_at_training": aggregated training price (scalar)
- "price_at_forecast": aggregated forecast price (scalar)
- "predicted_return": aggregated expected return (scalar)
"""
# Combine forecast dictionaries.
methods = {
'RNN': rnn_results,
'DecisionTree': dt_results,
'Naive': naive_results,
'ExpSmoothing': exp_results,
'ARIMA': arima_results,
'Sparse': sparse_results,
'Linear': linear_results
}
# Create a common time axis with unit steps from training_start to forecast_time.
common_times = np.linspace(training_start, forecast_time, forecast_time - training_start + 1)
# Containers for aggregated results.
median_results = {}
mean_results = {}
# Loop over assets.
for idx, asset in enumerate(asset_names):
print(f"\n{'='*60}\nAsset: {asset}")
# Get asset training data.
times = np.array(asset_times[idx])
prices = np.array(asset_prices[idx])
mask = (times >= training_start) & (times <= training_end)
t_train = times[mask]
real_training_price = prices[mask][-1]
print(f"Real training price (last value in training data): {real_training_price:.4f}")
# Initialize lists to collect scalar predictions and aligned full forecasts.
training_preds = [] # from each method: price_at_training
forecast_preds = [] # from each method: price_at_forecast
exp_returns = [] # expected return in percentage
full_predictions_aligned = [] # each element is an array aligned on common_times
# Print table header.
header = f"{'Method':<15} {'TrainPred':>10} {'Forecast':>10} {'Return (%)':>12}"
print(header)
print("-" * len(header))
for method_name, result in methods.items():
if asset not in result:
continue
res = result[asset]
train_pred = res["price_at_training"]
forecast_pred = res["price_at_forecast"]
# Expected return = (forecast / train_pred)*100%
exp_return = (forecast_pred / train_pred) * 100.0
print(f"{method_name:<15} {train_pred:10.4f} {forecast_pred:10.4f} {exp_return:12.2f}")
training_preds.append(train_pred)
forecast_preds.append(forecast_pred)
exp_returns.append(exp_return)
# Interpolate full reconstruction on common_times.
method_times, method_preds = res["full_reconstruction"]
aligned = np.interp(common_times, method_times, method_preds)
full_predictions_aligned.append(aligned)
# Compute aggregated scalar values.
median_train = np.median(training_preds)
median_forecast = np.median(forecast_preds)
median_return = np.median(exp_returns)
mean_train = np.mean(training_preds)
mean_forecast = np.mean(forecast_preds)
mean_return = np.mean(exp_returns)
print("-" * len(header))
print(f"{'Median':<15} {median_train:10.4f} {median_forecast:10.4f} {median_return:12.2f}")
print(f"{'Mean':<15} {mean_train:10.4f} {mean_forecast:10.4f} {mean_return:12.2f}")
# Compute pointwise aggregated forecasts.
full_predictions_aligned = np.array(full_predictions_aligned)
median_full = np.median(full_predictions_aligned, axis=0)
mean_full = np.mean(full_predictions_aligned, axis=0)
# Save aggregated results for this asset.
# For in-sample predictions, we simply fill the training period with the aggregated training price.
median_results[asset] = {
"model_predictions": (t_train, np.full_like(t_train, median_train)),
"full_reconstruction": (common_times, median_full),
"price_at_training": median_train,
"price_at_forecast": median_forecast,
"predicted_return": median_return
}
mean_results[asset] = {
"model_predictions": (t_train, np.full_like(t_train, mean_train)),
"full_reconstruction": (common_times, mean_full),
"price_at_training": mean_train,
"price_at_forecast": mean_forecast,
"predicted_return": mean_return
}
# Plotting: Two separate 5x4 charts.
n_assets = len(asset_names)
n_rows, n_cols = 5, 4
# Figure for Median forecasts.
fig_med, axes_med = plt.subplots(n_rows, n_cols, figsize=(20, 25), sharex=True, sharey=False)
axes_med = axes_med.flatten()
for idx, asset in enumerate(asset_names):
ax = axes_med[idx]
# Plot aggregated median forecast.
ax.plot(common_times, median_results[asset]["full_reconstruction"][1], label="Median Forecast", color='black', linewidth=2)
# Plot each method's forecast.
for method_name, result in methods.items():
if asset in result:
m_times, m_preds = result[asset]["full_reconstruction"]
aligned = np.interp(common_times, m_times, m_preds)
ax.plot(common_times, aligned, label=method_name, linestyle="--", alpha=0.6)
# Plot real training data.
mask = (np.array(asset_times[asset_names.index(asset)]) >= training_start) & (np.array(asset_times[asset_names.index(asset)]) <= training_end)
t_train = np.array(asset_times[asset_names.index(asset)])[mask]
ax.plot(t_train, np.array(asset_prices[asset_names.index(asset)])[mask], label="Real Training", color="blue", linewidth=2)
ax.set_title(asset)
ax.set_xlabel("Time")
ax.set_ylabel("Price")
ax.legend(fontsize=8)
# Hide any unused subplots.
for j in range(idx+1, n_rows*n_cols):
axes_med[j].axis('off')
fig_med.tight_layout()
# Figure for Mean forecasts.
fig_mean, axes_mean = plt.subplots(n_rows, n_cols, figsize=(20, 25), sharex=True, sharey=False)
axes_mean = axes_mean.flatten()
for idx, asset in enumerate(asset_names):
ax = axes_mean[idx]
# Plot aggregated mean forecast.
ax.plot(common_times, mean_results[asset]["full_reconstruction"][1], label="Mean Forecast", color='black', linewidth=2)
# Plot each method's forecast.
for method_name, result in methods.items():
if asset in result:
m_times, m_preds = result[asset]["full_reconstruction"]
aligned = np.interp(common_times, m_times, m_preds)
ax.plot(common_times, aligned, label=method_name, linestyle="--", alpha=0.6)
# Plot real training data.
mask = (np.array(asset_times[asset_names.index(asset)]) >= training_start) & (np.array(asset_times[asset_names.index(asset)]) <= training_end)
t_train = np.array(asset_times[asset_names.index(asset)])[mask]
ax.plot(t_train, np.array(asset_prices[asset_names.index(asset)])[mask], label="Real Training", color="blue", linewidth=2)
ax.set_title(asset)
ax.set_xlabel("Time")
ax.set_ylabel("Price")
ax.legend(fontsize=8)
for j in range(idx+1, n_rows*n_cols):
axes_mean[j].axis('off')
fig_mean.tight_layout()
plt.show()
# Return the aggregated results.
return {"median": median_results, "mean": mean_results}
results = aggregate_and_plot_mean_median(asset_names, asset_times, asset_prices,
rnn_results, dt_results, naive_results, exp_results,
arima_results, sparse_results, linear_results,
training_start=0, training_end=100, forecast_time=200)
============================================================
Asset: SafeAndCare
Real training price (last value in training data): 6.3601
Method           TrainPred   Forecast   Return (%)
--------------------------------------------------
RNN                 6.3818     6.6640       104.42
DecisionTree        6.3601     5.4799        86.16
Naive               6.3601     6.3601       100.00
ExpSmoothing        6.3384     5.0240        79.26
ARIMA               6.3513     6.0570        95.37
Sparse              6.2786     7.8440       124.93
Linear              6.4551     5.5750        86.36
--------------------------------------------------
Median              6.3601     6.0570        95.37
Mean                6.3608     6.1434        96.64

(analogous tables are printed for the remaining 19 assets: Moneymakers, Fuel4, CPU-XYZ, MarsProject, WorldNow, Photons, EnviroLike, BetterTechnology, PearPear, PositiveCorrelation, WaterForce, Electronics123, RoboticsX, Apples, BetterTomorrow, SpaceNow, Lasers, SuperFuture, ABCDE)
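The alignment step inside the aggregation uses `np.interp` to resample every method's forecast onto one common axis before taking the pointwise median and mean; a minimal two-method sketch with made-up time grids:

```python
import numpy as np

common_times = np.linspace(0, 10, 11)

# Two hypothetical forecasts on different time grids
t_a, p_a = np.array([0.0, 5.0, 10.0]), np.array([1.0, 2.0, 3.0])
t_b, p_b = np.array([0.0, 10.0]), np.array([3.0, 1.0])

aligned = np.vstack([np.interp(common_times, t_a, p_a),
                     np.interp(common_times, t_b, p_b)])
median_forecast = np.median(aligned, axis=0)
mean_forecast = aligned.mean(axis=0)
```

With only two methods the pointwise median equals the pointwise mean; with seven, as above, the median is the more outlier-robust aggregate.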
Markowitz Model using mean results¶
# Extract expected returns from mean results, converting the aggregated
# percent values (forecast / train * 100) into net returns
expected_returns = np.array([results["mean"][asset]["predicted_return"] / 100.0 - 1 for asset in asset_names])
n_assets = len(asset_names)
# Compute covariance matrix using historical data
historical_returns = []
for i, asset in enumerate(asset_names):
# Get prices during training period
prices = np.array(asset_prices[i])
times = np.array(asset_times[i])
mask = (times >= 0) & (times <= 100)
asset_prices_train = prices[mask]
# Calculate returns
asset_returns = np.diff(asset_prices_train) / asset_prices_train[:-1]
historical_returns.append(asset_returns)
# Ensure all historical returns have the same length
min_length = min(len(returns) for returns in historical_returns)
historical_returns = [returns[:min_length] for returns in historical_returns]
historical_returns = np.array(historical_returns)
# Compute covariance matrix
cov_matrix = np.cov(historical_returns)
# Ensure covariance matrix is positive semi-definite
eigenvalues = np.linalg.eigvalsh(cov_matrix)
if np.any(eigenvalues < 0):
cov_matrix += np.eye(n_assets) * 1e-8
print("Expected Returns (sample):")
for i in range(min(5, n_assets)):
print(f"{asset_names[i]}: {expected_returns[i]:.4f}")
print("\nCovariance Matrix (sample):")
print(cov_matrix[:3, :3])
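The per-period return and covariance computation above can be checked on a tiny two-asset example (made-up prices, not the bundle data):

```python
import numpy as np

prices = np.array([[10.0, 11.0, 10.5, 11.5],   # asset 1
                   [20.0, 19.0, 21.0, 20.5]])  # asset 2

# Simple returns: (p_{t+1} - p_t) / p_t, computed per asset along the time axis
returns = np.diff(prices, axis=1) / prices[:, :-1]

# np.cov treats each row as one variable (one asset), so the result is 2x2
cov = np.cov(returns)
```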
# Find minimum risk portfolio
def solve_min_risk():
"""Find the portfolio with minimum risk"""
P = matrix(cov_matrix)
q = matrix(np.zeros(n_assets))
# Constraint: w ≥ 0
G = matrix(-np.eye(n_assets))
h = matrix(np.zeros(n_assets))
# Constraint: sum(w) = 1
A = matrix(np.ones((1, n_assets)))
b = matrix(np.ones(1))
sol = solvers.qp(P, q, G, h, A, b)
weights = np.array(sol['x']).flatten()
port_return = np.dot(weights, expected_returns)
port_risk = np.sqrt(np.dot(weights.T, np.dot(cov_matrix, weights)))
return weights, port_risk, port_return
# Find maximum return portfolio
def solve_max_return():
"""Find the portfolio with maximum return"""
c = matrix(-expected_returns) # Negative because we minimize -r^T w
# Constraint: w ≥ 0
G = matrix(-np.eye(n_assets))
h = matrix(np.zeros(n_assets))
# Constraint: sum(w) = 1
A = matrix(np.ones((1, n_assets)))
b = matrix(np.ones(1))
sol = solvers.lp(c, G, h, A, b)
weights = np.array(sol['x']).flatten()
port_return = np.dot(weights, expected_returns)
port_risk = np.sqrt(np.dot(weights.T, np.dot(cov_matrix, weights)))
return weights, port_return, port_risk
# Find the extreme portfolios
min_risk_weights, min_risk_value, min_risk_return = solve_min_risk()
max_return_weights, max_return_value, max_return_risk = solve_max_return()
print("\nMinimum Risk Portfolio:")
print(f"Risk: {min_risk_value:.6f}")
print(f"Return: {min_risk_return:.6f}")
print("\nMaximum Return Portfolio:")
print(f"Return: {max_return_value:.6f}")
print(f"Risk: {max_return_risk:.6f}")
Expected Returns (sample):
SafeAndCare: -0.0336
Moneymakers: 0.1584
Fuel4: -0.1976
CPU-XYZ: 0.1936
MarsProject: 0.1217

Covariance Matrix (sample):
[[ 1.01854023e-04  6.69455358e-05 -1.71112855e-05]
 [ 6.69455358e-05  1.24950579e-03  1.45858732e-04]
 [-1.71112855e-05  1.45858732e-04  2.97224717e-04]]

Minimum Risk Portfolio:
Risk: 0.003115
Return: 0.064974

Maximum Return Portfolio:
Return: 0.326109
Risk: 0.009110
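As a cvxopt-free cross-check on the min-risk QP: with only the budget constraint 1ᵀw = 1 (ignoring w ≥ 0), the minimum-variance weights solve a small linear KKT system. A sketch on a made-up 2×2 covariance:

```python
import numpy as np

Sigma = np.array([[0.04, 0.01],
                  [0.01, 0.09]])
n = Sigma.shape[0]

# KKT system for min wᵀΣw s.t. 1ᵀw = 1:
# [2Σ  1] [w]   [0]
# [1ᵀ  0] [λ] = [1]
K = np.block([[2 * Sigma, np.ones((n, 1))],
              [np.ones((1, n)), np.zeros((1, 1))]])
rhs = np.concatenate([np.zeros(n), [1.0]])
w = np.linalg.solve(K, rhs)[:n]
# Closed form for 2 assets: w1 = (σ2² - σ12) / (σ1² + σ2² - 2σ12) = 0.08 / 0.11
```

cvxopt's `qp` solves the same problem with the additional w ≥ 0 inequality, which this sign-unconstrained sketch omits.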
Implementation of the two multi-objective methods:¶
- Weighted Sum Method (WSM)
- Epsilon-Constraint Method (ECM)
# Suppress solver output
solvers.options['show_progress'] = False
# Weighted Sum Method (WSM)
def weighted_sum_method(n_weights=10, normalize=True):
"""
Implements the Weighted Sum Method for portfolio optimization.
Args:
n_weights: Number of weight vectors to use
normalize: Whether to scale both objectives to comparable ranges
Returns:
List of unique Pareto optimal solutions (weights, return, risk)
"""
# Set up normalization ranges if needed
if normalize:
return_range = max_return_value - min_risk_return
risk_range = max_return_risk - min_risk_value
# Generate weight vectors
weight_vectors = []
for i in range(n_weights):
w_return = i / (n_weights - 1) # Weight for return
w_risk = 1 - w_return # Weight for risk
weight_vectors.append((w_return, w_risk))
# Solve for each weight vector
solutions = []
for w_return, w_risk in weight_vectors:
# Setup quadratic programming parameters
if normalize:
# For normalized objective function
q_mod = matrix(-w_return * expected_returns / return_range)
P_mod = matrix(w_risk * cov_matrix / risk_range)
else:
# For non-normalized objective function
q_mod = matrix(-w_return * expected_returns)
P_mod = matrix(w_risk * cov_matrix)
# Constraint: w ≥ 0
G = matrix(-np.eye(n_assets))
h = matrix(np.zeros(n_assets))
# Constraint: sum(w) = 1
A = matrix(np.ones((1, n_assets)))
b = matrix(np.ones(1))
# Solve QP
sol = solvers.qp(P_mod, q_mod, G, h, A, b)
if sol['status'] != 'optimal':
print(f"Warning: Optimization did not reach an optimal solution for weights {w_return}, {w_risk}. Status: {sol['status']}")
continue
weights = np.array(sol['x']).flatten()
port_return = np.dot(weights, expected_returns)
port_risk = np.sqrt(np.dot(weights.T, np.dot(cov_matrix, weights)))
solutions.append((weights, port_return, port_risk))
# Filter duplicate solutions
unique_solutions = []
for sol in solutions:
is_unique = True
for existing_sol in unique_solutions:
if (np.abs(sol[1] - existing_sol[1]) < 1e-6 and
np.abs(sol[2] - existing_sol[2]) < 1e-6):
is_unique = False
break
if is_unique:
unique_solutions.append(sol)
return unique_solutions
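The scalarization in WSM can be visualized on a two-asset toy problem (hypothetical numbers): each weight vector (w_return, w_risk) picks out one point on the trade-off curve, here found by a grid search instead of a QP solver.

```python
import numpy as np

mu = np.array([0.05, 0.15])      # hypothetical expected returns
var1, var2 = 0.01, 0.09          # hypothetical variances, uncorrelated assets

# Parametrize feasible portfolios by the weight a on asset 1
a = np.linspace(0, 1, 1001)
rets = a * mu[0] + (1 - a) * mu[1]
vars_ = a**2 * var1 + (1 - a)**2 * var2

solutions = []
for w_return in np.linspace(0, 1, 5):
    w_risk = 1 - w_return
    # Minimize the scalarized objective over the grid
    obj = w_risk * vars_ - w_return * rets
    solutions.append(a[np.argmin(obj)])
# w_return = 0 -> minimum-variance mix (a = 0.9 here);
# w_return = 1 -> all-in on the higher-return asset (a = 0)
```

Each scalarization weight selects one Pareto point, but nearby weights can collapse onto the same portfolio, which is why WSM often reports fewer unique solutions than weight vectors.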
# Epsilon-Constraint Method (ECM)
def epsilon_constraint_method(n_thresholds=10):
"""
Implements the Epsilon-Constraint Method for portfolio optimization.
Args:
n_thresholds: Number of threshold values to use
Returns:
List of Pareto optimal solutions (weights, return, risk)
"""
# Generate threshold values
thresholds = []
for i in range(n_thresholds):
t = min_risk_return + (i / (n_thresholds - 1)) * (max_return_value - min_risk_return)
thresholds.append(t)
# Solve for each threshold
solutions = []
for threshold in thresholds:
# Setup quadratic programming parameters
P = matrix(cov_matrix)
q = matrix(np.zeros(n_assets))
# Constraints: w ≥ 0 and r^T w ≥ threshold, both expressed as Gx ≤ h
G = matrix(np.vstack((-np.eye(n_assets), -expected_returns.reshape(1, -1))))
h = matrix(np.concatenate((np.zeros(n_assets), [-threshold])))
# Constraint: sum(w) = 1
A = matrix(np.ones((1, n_assets)))
b = matrix(np.ones(1))
# Solve QP
sol = solvers.qp(P, q, G, h, A, b)
if sol['status'] != 'optimal':
print(f"Warning: Optimization did not reach an optimal solution for threshold {threshold}. Status: {sol['status']}")
continue
weights = np.array(sol['x']).flatten()
port_return = np.dot(weights, expected_returns)
port_risk = np.sqrt(np.dot(weights.T, np.dot(cov_matrix, weights)))
solutions.append((weights, port_return, port_risk))
# Filter duplicate solutions
unique_solutions = []
for sol in solutions:
is_unique = True
for existing_sol in unique_solutions:
if (np.abs(sol[1] - existing_sol[1]) < 1e-6 and
np.abs(sol[2] - existing_sol[2]) < 1e-6):
is_unique = False
break
if is_unique:
unique_solutions.append(sol)
return unique_solutions
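The epsilon-constraint idea can likewise be sketched on the same kind of two-asset toy (hypothetical numbers): fix a minimum acceptable return and minimize variance among the portfolios that satisfy it.

```python
import numpy as np

mu = np.array([0.05, 0.15])
var1, var2 = 0.01, 0.09          # hypothetical, uncorrelated assets

a = np.linspace(0, 1, 1001)      # weight on asset 1
rets = a * mu[0] + (1 - a) * mu[1]
vars_ = a**2 * var1 + (1 - a)**2 * var2

pareto = []
for threshold in np.linspace(rets.min(), rets.max(), 5):
    feasible = rets >= threshold - 1e-12
    # Among feasible portfolios, take the minimum-variance one
    idx = np.argmin(np.where(feasible, vars_, np.inf))
    pareto.append((rets[idx], vars_[idx]))
# Risk is nondecreasing as the return threshold tightens
```

Because every threshold in the efficient range forces a distinct active constraint, ECM tends to spread its solutions more evenly along the front than WSM.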
# Function to plot Pareto front
def plot_pareto_front(solutions, title):
risks = [sol[2] for sol in solutions]
returns = [sol[1] for sol in solutions]
plt.figure(figsize=(10, 6))
plt.scatter(risks, returns, c='red', marker='o')
plt.xlabel('Risk')
plt.ylabel('Expected Return')
plt.title(title)
plt.grid(True)
plt.show()
# Test the methods with a small number of points
wsm_solutions = weighted_sum_method(10)
ecm_solutions = epsilon_constraint_method(10)
print(f"WSM found {len(wsm_solutions)} unique solutions")
print(f"ECM found {len(ecm_solutions)} unique solutions")
plot_pareto_front(wsm_solutions, "WSM Pareto Front")
plot_pareto_front(ecm_solutions, "ECM Pareto Front")
WSM found 6 unique solutions
ECM found 10 unique solutions
Experimentation:¶
- Running WSM and ECM with different numbers of sample points.
- Plotting the resulting Pareto fronts.
- Comparing how many unique solutions each approach yields.
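Both methods count "unique" solutions with a pairwise tolerance check; a faster, approximately equivalent trick (boundary cases can differ) is rounding the (return, risk) pairs and deduplicating with a set. Hypothetical values:

```python
points = [(0.10, 0.02),
          (0.1000000001, 0.0200000001),  # near-duplicate of the first point
          (0.12, 0.03)]

# Round to the tolerance's decimal places, then deduplicate with a set
unique = {(round(r, 6), round(s, 6)) for r, s in points}
# len(unique) == 2
```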
# Run WSM and ECM with different numbers of weight vectors/thresholds
point_counts = [10, 20, 100, 1000]
wsm_results = {}
ecm_results = {}
# Define different markers for each point count
markers = {
10: 'o', # circle
20: 's', # square
100: 'd', # diamond
1000: '*' # star
}
# Colors for each point count
colors = {
10: '#1f77b4', # blue
20: '#ff7f0e', # orange
100: '#d62728', # red
1000: '#9467bd' # purple
}
# Suppress solver output
solvers.options['show_progress'] = False
for count in point_counts:
# Run WSM with normalization
wsm_solutions = weighted_sum_method(count, normalize=True)
wsm_results[count] = {
'solutions': wsm_solutions,
'unique_count': len(wsm_solutions)
}
# Run ECM
ecm_solutions = epsilon_constraint_method(count)
ecm_results[count] = {
'solutions': ecm_solutions,
'unique_count': len(ecm_solutions)
}
# Bar chart for number of unique solutions
plt.figure(figsize=(12, 6))
x = np.arange(len(point_counts))
width = 0.35
wsm_counts = [wsm_results[c]['unique_count'] for c in point_counts]
ecm_counts = [ecm_results[c]['unique_count'] for c in point_counts]
plt.bar(x - width/2, wsm_counts, width, label='WSM', color='royalblue')
plt.bar(x + width/2, ecm_counts, width, label='ECM', color='tomato')
plt.xlabel('Number of Points', fontsize=14)
plt.ylabel('Number of Unique Solutions', fontsize=14)
plt.title('Comparison of Methods: Unique Solutions Found', fontsize=16)
plt.xticks(x, point_counts, fontsize=12)
plt.yticks(fontsize=12)
plt.legend(fontsize=14)
plt.grid(True, alpha=0.3)
# Add value labels on top of each bar
for i, v in enumerate(wsm_counts):
plt.text(i - width/2, v + 0.5, str(v), ha='center', fontsize=12)
for i, v in enumerate(ecm_counts):
plt.text(i + width/2, v + 0.5, str(v), ha='center', fontsize=12)
plt.tight_layout()
plt.show()
# Create separate plots for each point count
for count in point_counts:
plt.figure(figsize=(10, 6))
wsm_solutions = wsm_results[count]['solutions']
ecm_solutions = ecm_results[count]['solutions']
wsm_risks = [sol[2] for sol in wsm_solutions]
wsm_returns = [sol[1] for sol in wsm_solutions]
ecm_risks = [sol[2] for sol in ecm_solutions]
ecm_returns = [sol[1] for sol in ecm_solutions]
# Calculate marker sampling for both methods
wsm_n = max(1, len(wsm_risks) // min(50, len(wsm_risks)))
ecm_n = max(1, len(ecm_risks) // min(50, len(ecm_risks)))
plt.scatter(wsm_risks[::wsm_n], wsm_returns[::wsm_n], marker='o', s=100,
label=f'WSM (showing {len(wsm_risks[::wsm_n])} of {len(wsm_risks)} points)',
color='royalblue', alpha=0.8, edgecolors='navy')
plt.scatter(ecm_risks[::ecm_n], ecm_returns[::ecm_n], marker='x', s=100,
label=f'ECM (showing {len(ecm_risks[::ecm_n])} of {len(ecm_risks)} points)',
color='tomato', alpha=0.8)
# Add connecting lines to show the fronts more clearly
wsm_points = sorted(zip(wsm_risks, wsm_returns), key=lambda x: x[0])
ecm_points = sorted(zip(ecm_risks, ecm_returns), key=lambda x: x[0])
if wsm_points:
wsm_x, wsm_y = zip(*wsm_points)
plt.plot(wsm_x, wsm_y, '-', color='royalblue', alpha=0.7, linewidth=2.0)
if ecm_points:
ecm_x, ecm_y = zip(*ecm_points)
plt.plot(ecm_x, ecm_y, '-', color='tomato', alpha=0.7, linewidth=2.0)
plt.xlabel('Risk', fontsize=14)
plt.ylabel('Expected Return', fontsize=14)
plt.title(f'Pareto Front Comparison with {count} Points', fontsize=16)
plt.grid(True, alpha=0.3)
plt.legend(fontsize=12, loc='best')
plt.tight_layout()
plt.show()
# Combined plot with WSM only
plt.figure(figsize=(10, 6))
for count in point_counts:
solutions = wsm_results[count]['solutions']
risks = [sol[2] for sol in solutions]
returns = [sol[1] for sol in solutions]
# Connect points for better visualization
points = sorted(zip(risks, returns), key=lambda x: x[0])
if points:
x_vals, y_vals = zip(*points)
# Use larger, more distinct markers and plot more of them
n = max(1, len(x_vals) // min(60, len(x_vals)))
plt.scatter(x_vals[::n], y_vals[::n], s=150,
label=f'WSM ({count} points, {len(x_vals)} solutions)',
color=colors[count], marker=markers[count], alpha=0.9,
edgecolors='black', linewidth=1.5)
        # Connect the points with a line in the count's color
plt.plot(x_vals, y_vals, color=colors[count],
alpha=0.7, linewidth=2.5)
plt.xlabel('Risk', fontsize=14)
plt.ylabel('Expected Return', fontsize=14)
plt.title('WSM Pareto Fronts Comparison (simplified)', fontsize=16)
plt.grid(True, alpha=0.3)
plt.legend(fontsize=13, loc='lower right')
plt.tight_layout()
plt.show()
# Combined plot with ECM only - 10 vs 1000 comparison
plt.figure(figsize=(10, 6))
# Only compare 10 and 1000 points
comparison_counts = [10, 1000]
for count in comparison_counts:
solutions = ecm_results[count]['solutions']
risks = [sol[2] for sol in solutions]
returns = [sol[1] for sol in solutions]
# Connect points for better visualization
points = sorted(zip(risks, returns), key=lambda x: x[0])
if points:
x_vals, y_vals = zip(*points)
# Use larger, more distinct markers and plot more of them
n = max(1, len(x_vals) // min(40, len(x_vals)))
plt.scatter(x_vals[::n], y_vals[::n], s=150,
label=f'ECM ({count} points, {len(x_vals)} solutions)',
color=colors[count], marker=markers[count], alpha=0.9,
edgecolors='black', linewidth=1.5)
        # Connect the points with a line in the count's color
plt.plot(x_vals, y_vals, color=colors[count],
alpha=0.7, linewidth=2.5)
plt.xlabel('Risk', fontsize=14)
plt.ylabel('Expected Return', fontsize=14)
plt.title('ECM Pareto Fronts Comparison (simplified)', fontsize=16)
plt.grid(True, alpha=0.3)
plt.legend(fontsize=13, loc='lower right')
plt.tight_layout()
plt.show()
Conclusions¶
The Epsilon-Constraint Method (ECM) generally finds more unique Pareto optimal solutions compared to the Weighted Sum Method (WSM), even when WSM uses normalization. This confirms the theoretical advantages of ECM, particularly its ability to find solutions in concave regions of the Pareto front, which WSM may miss.
Key observations from the visualizations:
- ECM produces a much more complete and well-distributed Pareto front
- WSM tends to cluster solutions at the extremes, with gaps in between
- As the number of points increases, ECM continues to find new unique solutions along the entire Pareto front while WSM finds fewer new solutions
- Even with 1000 points, WSM still cannot adequately capture the concave regions
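The concavity argument can be made concrete on a toy bi-objective instance (three hypothetical candidate points, not drawn from the dataset): a Pareto-optimal point sitting in a concave dent of the front maximizes no weighted sum of the objectives, yet a constraint on the return objective reaches it directly.

```python
import numpy as np

# Hypothetical Pareto-optimal candidates: (risk, expected return).
# B is Pareto optimal but sits in a concave dent between A and C.
points = {'A': (1.0, 1.0), 'B': (2.0, 1.2), 'C': (3.0, 3.0)}

def wsm_pick(w):
    """Maximize w*return - (1-w)*risk over the candidates."""
    return max(points, key=lambda k: w * points[k][1] - (1 - w) * points[k][0])

def ecm_pick(threshold):
    """Minimize risk subject to return >= threshold."""
    feasible = {k: v for k, v in points.items() if v[1] >= threshold}
    return min(feasible, key=lambda k: feasible[k][0])

# Sweep the whole weight range: B is never the weighted-sum optimum ...
wsm_hits = {wsm_pick(w) for w in np.linspace(0.0, 1.0, 101)}
# ... but the epsilon-constraint with threshold 1.2 selects exactly B.
print(wsm_hits, ecm_pick(1.2))
```

This mirrors what the bar chart shows at scale: adding more WSM weight vectors revisits the convex-hull points, while additional ECM thresholds keep landing on distinct points of the front.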
Portfolio Game¶
# Define the asset names in the correct order
asset_names_ordered = [
"SuperFuture", "Apples", "WorldNow", "Electronics123", "Photons",
"SpaceNow", "PearPear", "PositiveCorrelation", "BetterTechnology", "ABCDE",
"EnviroLike", "Moneymakers", "Fuel4", "MarsProject", "CPU-XYZ",
"RoboticsX", "Lasers", "WaterForce", "SafeAndCare", "BetterTomorrow"
]
# Combine solutions from both methods for a more comprehensive Pareto front
wsm_solutions = wsm_results[1000]['solutions']
ecm_solutions = ecm_results[1000]['solutions']
all_solutions = wsm_solutions + ecm_solutions
# Analyze the risk and return characteristics of all portfolios
risks = [sol[2] for sol in all_solutions]
returns = [sol[1] for sol in all_solutions]
# Calculate return threshold (median return)
return_50th = np.percentile(returns, 50)
print(f"Return threshold (median): {return_50th:.6f}")
# Find MinRisk portfolio with above-median return
selected_idx = -1
min_risk_value = float('inf')
for i, (weights, port_return, port_risk) in enumerate(all_solutions):
if port_return > return_50th and port_risk < min_risk_value:
min_risk_value = port_risk
selected_idx = i
# Get the selected portfolio (fall back to the overall minimum-risk solution if none exceeds the median)
if selected_idx == -1:
    selected_idx = min(range(len(all_solutions)), key=lambda i: all_solutions[i][2])
selected_weights, selected_return, selected_risk = all_solutions[selected_idx]
selected_source = "WSM" if selected_idx < len(wsm_solutions) else "ECM"
selected_sharpe = selected_return / selected_risk if selected_risk > 0 else 0
print(f"\nSelected MinRisk Portfolio (from {selected_source}):")
print(f"Expected Return: {selected_return:.6f}")
print(f"Expected Risk: {selected_risk:.6f}")
print(f"Sharpe Ratio: {selected_sharpe:.6f}")
print(f"Strategy: Minimum risk with above-median return")
# Map weights to the correct order
portfolio_weights = {}
for i, weight in enumerate(selected_weights):
portfolio_weights[asset_names[i]] = weight
# Create ordered weights list
ordered_weights = []
for asset in asset_names_ordered:
if asset in portfolio_weights:
ordered_weights.append(portfolio_weights[asset])
else:
ordered_weights.append(0.0)
# Assert weights sum to 1
sum_weights = sum(ordered_weights)
assert abs(sum_weights - 1.0) < 1e-10, f"Portfolio weights must sum to 1.0. Current sum: {sum_weights}"
print(f"Sum of weights: {sum(ordered_weights)}")
# Plot the Pareto fronts with the selected portfolio
plt.figure(figsize=(10, 6))
# Sample solutions for clearer visualization
wsm_sample_rate = max(1, len(wsm_solutions) // min(50, len(wsm_solutions)))
wsm_sampled = wsm_solutions[::wsm_sample_rate]
ecm_sample_rate = max(1, len(ecm_solutions) // min(50, len(ecm_solutions)))
ecm_sampled = ecm_solutions[::ecm_sample_rate]
# Plot each method's front with improved visibility
plt.scatter([sol[2] for sol in wsm_sampled], [sol[1] for sol in wsm_sampled],
c='blue', s=40, alpha=0.6, label='WSM')
plt.scatter([sol[2] for sol in ecm_sampled], [sol[1] for sol in ecm_sampled],
            c='green', s=40, alpha=0.6, label='ECM')
# Draw lines connecting the points to better visualize the Pareto fronts
wsm_points = sorted([(sol[2], sol[1]) for sol in wsm_solutions], key=lambda x: x[0])
ecm_points = sorted([(sol[2], sol[1]) for sol in ecm_solutions], key=lambda x: x[0])
if wsm_points:
wsm_x, wsm_y = zip(*wsm_points)
plt.plot(wsm_x, wsm_y, '-', color='blue', alpha=0.3)
if ecm_points:
ecm_x, ecm_y = zip(*ecm_points)
plt.plot(ecm_x, ecm_y, '-', color='green', alpha=0.3)
# Highlight the selected portfolio
plt.scatter([selected_risk], [selected_return], c='red', s=200, marker='*',
edgecolors='black', linewidth=1.5, label='MinRisk Portfolio')
# Add a horizontal line at the median return
plt.axhline(y=return_50th, color='grey', linestyle='--', alpha=0.7)
plt.text(min(risks), return_50th+0.01, 'Median Return', fontsize=10)
# Add annotation for the selected portfolio
plt.annotate(f"Return: {selected_return:.4f}\nRisk: {selected_risk:.4f}",
xy=(selected_risk, selected_return),
xytext=(selected_risk+0.0008, selected_return-0.05),
arrowprops=dict(facecolor='black', shrink=0.05),
bbox=dict(boxstyle="round", fc="white", alpha=0.9))
plt.xlabel('Risk', fontsize=12)
plt.ylabel('Expected Return', fontsize=12)
plt.title('MinRisk Portfolio Selection', fontsize=14)
plt.grid(alpha=0.3)
plt.legend(fontsize=10)
plt.tight_layout()
plt.show()
# Create a bar chart of the portfolio weights
plt.figure(figsize=(12, 6))
# Convert the fractional return into a final portfolio value (initial budget of 100)
selected_return = 100 + selected_return * 100
# Get non-zero weights and sort by weight
asset_weight_pairs = [(asset, weight) for asset, weight in zip(asset_names_ordered, ordered_weights)
if weight > 0.001]
asset_weight_pairs.sort(key=lambda x: x[1], reverse=True)
sorted_assets = [pair[0] for pair in asset_weight_pairs]
sorted_weights = [pair[1] for pair in asset_weight_pairs]
# Create bar chart
bars = plt.bar(sorted_assets, sorted_weights, color='skyblue', edgecolor='blue')
# Add weight values on bars
for bar in bars:
height = bar.get_height()
plt.text(bar.get_x() + bar.get_width()/2., height + 0.005,
f'{height:.2%}', ha='center', va='bottom')
plt.xlabel('Assets', fontsize=12)
plt.ylabel('Weight', fontsize=12)
plt.title('MinRisk Portfolio Asset Allocation', fontsize=14)
plt.xticks(rotation=45, ha='right')
plt.grid(alpha=0.3, axis='y')
plt.tight_layout()
plt.show()
# Save the portfolio to a file
output_filename = "151958_151960.txt"
with open(output_filename, 'w') as f:
f.write(f"{selected_return:.2f} {selected_risk:.6f}")
for weight in ordered_weights:
f.write(f" {weight:.6f}")
print(f"Portfolio saved to {output_filename}")
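As a sanity check on the submission format, the single saved line ("return risk w1 ... w20") can be parsed back and the weight sum re-verified. A minimal round-trip sketch with hypothetical equal weights; `parse_portfolio_line` is an illustrative helper, not part of the game's tooling:

```python
import os
import tempfile

def parse_portfolio_line(line, n_assets=20):
    """Parse 'return risk w1 ... w20' as written above into a dict."""
    parts = line.split()
    assert len(parts) == 2 + n_assets, f"expected {2 + n_assets} fields, got {len(parts)}"
    return {
        'return': float(parts[0]),
        'risk': float(parts[1]),
        'weights': [float(p) for p in parts[2:]],
    }

# Round-trip with placeholder values (equal weights), mirroring the writer above
weights = [1.0 / 20] * 20
line = f"{121.32:.2f} {0.004536:.6f}" + "".join(f" {w:.6f}" for w in weights)
with tempfile.NamedTemporaryFile('w+', suffix='.txt', delete=False) as f:
    f.write(line)
    path = f.name
with open(path) as f:
    parsed = parse_portfolio_line(f.read())
os.unlink(path)
assert abs(sum(parsed['weights']) - 1.0) < 1e-4
```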
Return threshold (median): 0.212925

Selected MinRisk Portfolio (from ECM):
Expected Return: 0.213186
Expected Risk: 0.004536
Sharpe Ratio: 46.993663
Strategy: Minimum risk with above-median return
Sum of weights: 1.0
Portfolio saved to 151958_151960.txt
Our Selected Portfolio¶
We chose a portfolio optimization strategy that balances risk minimization with reasonable returns.
- Expected Return: 121.32 (21.32% expected growth)
- Risk Level: 0.0045 (very low volatility)
- Position on Pareto Front: We identified the crucial "elbow" point where the risk-return trade-off appears most favorable
- Method Used: We found the ECM (Epsilon-Constraint Method) produced better results for our specific criteria
Our Asset Allocation¶
Our analysis led us to a focused investment strategy:
- Photons (52.15%): We chose this as our primary investment due to its excellent risk-adjusted returns
- SafeAndCare (15.97%): We allocated a significant portion here for stability
- WorldNow (11.02%): We included this for additional growth potential
- Diversified holdings: We spread smaller investments across 9 additional assets to mitigate asset-specific risks